2024-06-18
Question
What happens when multiple units adopt the intervention at different times?
Example
Different states adopted COVID-19 vaccine mandates for state employees at different times.
Multiple units, treated at different time points
Multiple time points
Caution
The TWFE model can accommodate this:
\[ Y_{it} = \alpha_i + \gamma_t + \theta I(X_{it} = 1)+\epsilon_{it} \]
But as written it assumes treatment effect homogeneity across time periods, time-on-treatment, and units.
Heterogeneity is common in many settings, particularly in epidemiology and when adoption is not randomized.
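As a minimal sketch (not part of the original slides), the TWFE coefficient \(\theta\) can be computed for a balanced panel by subtracting unit and period means from both the outcome and the treatment indicator, which is equivalent to including the fixed effects \(\alpha_i\) and \(\gamma_t\). The toy panel, the unit effects, the common trend, and the function names are all illustrative assumptions. The data are noise-free with a homogeneous effect, which is the case where TWFE recovers \(\theta\).

```python
from statistics import mean


def double_demean(x):
    """Subtract unit and period means and add back the grand mean.

    Assumes a balanced panel stored as x[i][t] (unit i, period t).
    """
    n, T = len(x), len(x[0])
    unit_mean = [mean(row) for row in x]
    time_mean = [mean(x[i][t] for i in range(n)) for t in range(T)]
    grand = mean(unit_mean)
    return [[x[i][t] - unit_mean[i] - time_mean[t] + grand for t in range(T)]
            for i in range(n)]


def twfe_estimate(y, d):
    """Slope from regressing double-demeaned y on double-demeaned d."""
    y_dd, d_dd = double_demean(y), double_demean(d)
    pairs = [(yv, dv) for yr, dr in zip(y_dd, d_dd) for yv, dv in zip(yr, dr)]
    return sum(yv * dv for yv, dv in pairs) / sum(dv * dv for _, dv in pairs)


# Hypothetical toy panel: three units, periods 0..5, staggered adoption.
T = 6
first_treated = [2, 4, None]   # None marks a never-treated unit
alpha = [1.0, 3.0, 5.0]        # unit effects (assumed)
theta = 2.0                    # homogeneous treatment effect (assumed)

d = [[float(g is not None and t >= g) for t in range(T)] for g in first_treated]
y = [[alpha[i] + 0.5 * t + theta * d[i][t] for t in range(T)]
     for i in range(len(first_treated))]

theta_hat = twfe_estimate(y, d)
print(theta_hat)
```

With homogeneous effects and no noise, `theta_hat` matches `theta` up to floating-point error.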
Question
What might cause heterogeneity in the effect of a state employee COVID-19 vaccine mandate?
Goodman-Bacon (2021), Figure 1.
Goodman-Bacon (2021), Figure 3.
The weights on treatment effects can be non-convex (i.e., some can be negative) if either of the following is true:
There are time-varying treatment effects
There are heterogeneous treatment effects across timing groups
This gives an uninterpretable estimand, and can even switch the sign of the estimate.
The TWFE model estimates a weighted average of all 2x2 DID comparisons.
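To illustrate how one of these 2x2 comparisons can go wrong, here is a small noise-free sketch with an early and a late timing group whose effect grows by one unit per period of exposure. The data-generating process, the group names, and the helper `did_2x2` are assumptions made for illustration only. The comparison that uses the already-treated early group as a control yields a negative estimate even though every true effect is positive.

```python
from statistics import mean

T = 6  # periods 0..5


def outcome(alpha, g, t):
    """Noise-free outcome: unit effect + common trend + effect growing with exposure."""
    exposure = t - g + 1 if t >= g else 0
    return alpha + 0.5 * t + 1.0 * exposure


early = [outcome(1.0, 2, t) for t in range(T)]  # first treated in period 2
late = [outcome(3.0, 4, t) for t in range(T)]   # first treated in period 4


def did_2x2(treated, control, pre, post):
    """Classic 2x2 DID on period averages."""
    change_treated = mean(treated[t] for t in post) - mean(treated[t] for t in pre)
    change_control = mean(control[t] for t in post) - mean(control[t] for t in pre)
    return change_treated - change_control


# Clean comparison: early group vs. the late group while it is still untreated.
clean = did_2x2(early, late, pre=[0, 1], post=[2, 3])

# Problematic comparison: late group vs. the already-treated early group.
forbidden = did_2x2(late, early, pre=[2, 3], post=[4, 5])

print(clean, forbidden)
```

In this setup the clean comparison recovers the early group's average effect over periods 2 and 3 (1.5), while the second comparison subtracts the early group's still-growing effect and returns -0.5.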
Goodman-Bacon (2021), Figure 6.
We can also observe the overall weight given to each treatment timing group, which may be negative if it is more often used as a control than a treated group.
Goodman-Bacon (2021), Figure 7.
Let
\[ ATT(g,t) = E[Y_{it}(g) - Y_{it}(0) \mid G_i = g], \]
the group-time ATT in period \(t\) for units first treated in period \(g\) (that is, \(G_i = g\)), compared with what would have happened had they never been treated (or not yet been treated by period \(t\)).
Many solutions boil down to considering which group-time ATTs should be included in the estimand, how they differ, and how to weight them.
TWFE assumes \(ATT(g,t) = \theta\) for all \(g \le t\).
\[ Y_{it} = \alpha_i + \gamma_t + \sum_{k \neq 0} \delta_{k} I(K_{it} = k)+\epsilon_{it}, \]
where \(K_{it}\) is the lead/lag for unit \(i\) in period \(t\) (e.g., \(K_{it} = 1\) in the first exposed period).
This specification captures time-on-treatment heterogeneity; see Borusyak and Jaravel (2018) and Borusyak et al. (2024).
It is also useful for testing for “pre-trends” in the single-intervention-time setting.
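The lead/lag indicator can be built directly from the period and the adoption time. This is a minimal sketch under the convention stated above (\(K_{it} = 1\) in the first exposed period), so \(k = 0\) is the last pre-treatment period and serves as the omitted reference. The function names and the choice to give never-treated units all-zero indicators are assumptions for illustration.

```python
def event_time(t, g):
    """Lead/lag K_it for a unit first treated in period g.

    Returns 1 in the first exposed period, 0 in the last pre-treatment
    period, negative values for earlier leads, and None if never treated.
    """
    return None if g is None else t - g + 1


def event_time_dummies(t, g, leads_lags):
    """Indicators I(K_it = k) for each k in leads_lags, omitting k = 0.

    Never-treated units get all zeros (an assumed convention).
    """
    k = event_time(t, g)
    return {j: int(k == j) for j in leads_lags if j != 0}


# A unit first treated in period 4, observed in period 5, is in lag k = 2.
row = event_time_dummies(t=5, g=4, leads_lags=range(-2, 4))
print(row)
```

A negative \(k\) corresponds to a lead, so the coefficients \(\delta_k\) for \(k < 0\) are the ones inspected when checking for pre-trends.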
We can account for timing cohort heterogeneity as well by further allowing the effect to vary by adoption timing group (\(G_i\)):
\[ Y_{it} = \alpha_i + \gamma_t + \sum_g \sum_{k \neq 0} \delta_{g,k} I(G_i = g) I(K_{it} = k) + \epsilon_{it} \]
Various methods use this approach, and differ in which comparisons/observations they allow and how they combine results. This implies different assumptions and bias-variance tradeoffs.
One approach to avoiding forbidden comparisons is restricting the observations used to fit the model.
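Borusyak et al. (2024) fit the unit and period fixed effects (the model without treatment terms) using only untreated observations, i.e., never-treated units and not-yet-treated periods. They then use the fitted model to impute counterfactual untreated outcomes for the treated observations and compare these with the observed outcomes.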
Sun and Abraham (2021) use this approach with a clean “control” group \(C\) that is either never-treated or last-treated. Their regression approach then implicitly weights by population share in each timing group.
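The imputation idea can be sketched in a few lines. This is an illustrative simplification, not the authors' estimator or software: it fits \(\alpha_i + \gamma_t\) on untreated observations by alternating means (an assumed fitting method chosen to avoid dependencies), imputes \(Y_{it}(0)\) for treated observations, and averages the differences. It assumes every unit has at least one untreated period and every period has at least one untreated unit. The toy data and the function name are hypothetical.

```python
from statistics import mean


def impute_att(y, first_treated, n_iter=200):
    """Average of (observed - imputed untreated outcome) over treated observations."""
    n, T = len(y), len(y[0])
    untreated = [[first_treated[i] is None or t < first_treated[i]
                  for t in range(T)] for i in range(n)]
    alpha = [0.0] * n
    gamma = [0.0] * T
    # Alternate between updating unit effects and period effects,
    # using untreated observations only.
    for _ in range(n_iter):
        for i in range(n):
            alpha[i] = mean(y[i][t] - gamma[t] for t in range(T) if untreated[i][t])
        for t in range(T):
            gamma[t] = mean(y[i][t] - alpha[i] for i in range(n) if untreated[i][t])
    effects = [y[i][t] - (alpha[i] + gamma[t])
               for i in range(n) for t in range(T) if not untreated[i][t]]
    return mean(effects)


# Hypothetical noise-free panel: the effect equals the number of exposed periods.
T = 6
first_treated = [2, 2, 4, None]
unit_effect = [1.0, 2.0, 3.0, 4.0]


def exposure(g, t):
    return t - g + 1 if g is not None and t >= g else 0


y = [[unit_effect[i] + 0.5 * t + exposure(g, t) for t in range(T)]
     for i, g in enumerate(first_treated)]

att_imputed = impute_att(y, first_treated)
print(att_imputed)
```

Here the average is taken with equal weight on every treated observation; other weightings would target a different estimand.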
Another approach considers only the effect at the time of switching into treatment. For each time period \(t\) with at least one unit untreated at \(t-1\) and treated at \(t\) and at least one unit untreated at both \(t-1\) and \(t\), compute:
\[ \begin{align*} \widehat{DID}_{+,t} = \frac{1}{N_{1,0,t}} &\sum_{i:D_{i,t}=1,D_{i,t-1}=0} \left( Y_{i,t} - Y_{i,t-1} \right) \\ &- \frac{1}{N_{0,0,t}} \sum_{i:D_{i,t}=D_{i,t-1}=0} \left( Y_{i,t} - Y_{i,t-1} \right) \end{align*} \]
Average these switcher estimates across all time periods \(t\), weighted by number of units or individuals.
See de Chaisemartin and d’Haultfoeuille (2020) and de Chaisemartin and d’Haultfoeuille (2023).
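The switcher estimate \(\widehat{DID}_{+,t}\) and its average over periods can be sketched as follows. This is a minimal illustration of the formula above, weighting each period by its number of switchers \(N_{1,0,t}\); it is not the authors' software, and the toy panel and function names are assumptions.

```python
from statistics import mean


def did_plus(y, d, t):
    """DID_{+,t}: switchers' outcome change minus untreated stayers' change.

    Returns (estimate, number of switchers), or None if period t has no
    switchers or no units untreated at both t-1 and t.
    """
    units = range(len(y))
    switchers = [i for i in units if d[i][t] == 1 and d[i][t - 1] == 0]
    stayers = [i for i in units if d[i][t] == 0 and d[i][t - 1] == 0]
    if not switchers or not stayers:
        return None
    estimate = (mean(y[i][t] - y[i][t - 1] for i in switchers)
                - mean(y[i][t] - y[i][t - 1] for i in stayers))
    return estimate, len(switchers)


def switcher_average(y, d):
    """Average of DID_{+,t} over usable periods, weighted by switcher counts."""
    results = [did_plus(y, d, t) for t in range(1, len(y[0]))]
    results = [r for r in results if r is not None]
    total = sum(n for _, n in results)
    return sum(est * n for est, n in results) / total


# Hypothetical noise-free panel: four units, periods 0..3.
first_treated = [1, 2, 2, None]
instant_effect = [2.0, 3.0, 5.0, 0.0]  # unit-specific effects (assumed)
T = 4
d = [[int(g is not None and t >= g) for t in range(T)] for g in first_treated]
y = [[float(i) + 1.0 * t + instant_effect[i] * d[i][t] for t in range(T)]
     for i in range(len(first_treated))]

switch_effect = switcher_average(y, d)
print(did_plus(y, d, 1), did_plus(y, d, 2), switch_effect)
```

In this toy panel the result equals the average instantaneous effect among the three switching units, (2 + 3 + 5) / 3.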
Callaway and Sant’Anna (2021) propose estimating \(\widehat{ATT}_{g,t}\) for each timing group \(g\) and period \(t\) non-parametrically, using the last pre-treatment period as the baseline. The suggested estimators are inverse probability weighting (IPW), outcome regression (OR), and doubly robust (DR).
Then summarize to an overall average effect weighted by \(w_{g,t}\):
\[ \theta = \sum_g \sum_{t=2}^T w_{g,t} ATT_{g,t}. \]
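As a rough sketch of the group-time idea, the code below computes each \(ATT_{g,t}\) as a simple difference in mean changes between timing group \(g\) and never-treated units, with period \(g-1\) as the baseline, then averages the post-treatment cells. This is a simplification with no covariates, so it is not the IPW, OR, or DR estimator itself. The weights (proportional to group size), the 0-indexed periods, the toy data, and the function names are all assumptions for illustration.

```python
from statistics import mean


def att_gt(y, first_treated, g, t):
    """Difference in mean changes from period g-1 to t: group g vs. never-treated."""
    cohort = [i for i, gi in enumerate(first_treated) if gi == g]
    never = [i for i, gi in enumerate(first_treated) if gi is None]
    base = g - 1  # last pre-treatment period for group g
    return (mean(y[i][t] - y[i][base] for i in cohort)
            - mean(y[i][t] - y[i][base] for i in never))


def overall_att(y, first_treated):
    """Weighted average of post-treatment ATT(g,t), weights proportional to group size."""
    T = len(y[0])
    cells = []
    for g in sorted({gi for gi in first_treated if gi is not None}):
        size = sum(1 for gi in first_treated if gi == g)
        for t in range(g, T):
            cells.append((size, att_gt(y, first_treated, g, t)))
    total = sum(w for w, _ in cells)
    return sum(w * a for w, a in cells) / total


# Hypothetical noise-free panel: the effect equals the number of exposed periods.
T = 6
first_treated = [2, 2, 4, None]
unit_effect = [1.0, 2.0, 3.0, 4.0]


def exposure(g, t):
    return t - g + 1 if g is not None and t >= g else 0


y = [[unit_effect[i] + 0.5 * t + exposure(g, t) for t in range(T)]
     for i, g in enumerate(first_treated)]

theta_hat = overall_att(y, first_treated)
print(att_gt(y, first_treated, 2, 4), att_gt(y, first_treated, 4, 5), theta_hat)
```

Changing the weights changes the estimand: for example, averaging within each timing group first and then across groups would give less weight to early adopters' long-run effects.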
More generally, one can weight across all 2x2 DID comparisons, with weights chosen to target a specific estimand and, subject to that, to minimize variance.
\[ \hat{\theta} = \sum_{i,i',t,t'} w_{i,i',t,t'} \left[ \left( Y_{i,t'} - Y_{i,t} \right) - \left( Y_{i',t'} - Y_{i',t} \right) \right] \]
Review/survey papers:
The identifying assumptions are complicated by staggered adoption and the longer time frames implied by panel data. Recent work has focused on how to interpret and test these assumptions and how to incorporate time-varying covariates.
The no-anticipation (or known/limited anticipation) assumption still must hold, as must the no-spillover assumption.
All of these approaches change the precise specification of the estimand as well: the ATT must be interpreted in terms of the included time periods, lags, and units, and how they are weighted.
Important
It’s easy to ignore the fundamentals when using the more advanced methods. Consider the validity of the data, the question being asked, and the feasibility of the effect.
Consider data source carefully
Think about possible heterogeneities and desired estimand
Use graphical displays and diagnostics to assess possible biases and trade-offs
Consider multiple estimation methods for robustness to different assumptions
Pre-specify, and explore with appropriate caveats